Talk overview


Discipline background


Pedagogic challenges

Who am I

Part 1: Urban systems science?

What do we mean

Urban systems

A set of towns and cities \[*or **functions** within cities*\] that can be considered linked together by various forms of social and economic interaction

Source: Oxford reference

Systems thinking

Methods aimed at studying a system through its collective behavioral features

Source: Cristiano et al. 2020

Tools for Systems Thinkers: The 6 Fundamental Concepts of Systems Thinking

Science of cities

The science of cities – using evidence to understand how cities work – is forever expanding

Source: UK Government

Urban science

Urban science is an interdisciplinary field that studies diverse urban issues and problems

Source: Wikipedia

Urban systems science



Urban Systems: Cities \[or functions within cities\] that can be considered linked together \[there is a relationship between them\]

+



Urban Science: Urban issues and problems

=



Smart Cities: networks and services are made more efficient with the use of digital solutions for the benefit of its inhabitants and business.

Source: Smart Cities, European Commission

Urban system science approach:

Updated from Grolemund & Wickham's classic R4DS schematic, envisioned by Dr. Julia Lowndes for her 2019 useR! keynote talk and illustrated by Allison Horst. Source: Allison Horst data science and stats illustrations

The same as regular data science but with spatial data

An example: Urban Heat Island (UHI) effect

Fremantle Woolstore, Western Australia

An example: UHI

Ran 4 scenarios:

  1. Original (existing) development (from satellite imagery)
  2. Proposed redevelopment as in the plan
  3. Proposed redevelopment removing trees
  4. Proposed redevelopment with trees covering the hottest pixels

How smart are cities?


Part 2: Spatial data

What is spatial data?

  • The earth is a 3D sphere (well, almost). It’s wider than it is tall
  • In order to locate a point on the surface of a sphere, we need a set of coordinates
  • Coordinates will tell us how near to the top or bottom of the sphere we are, or how far around
  • But where do we start?

What is spatial data 2?

Geographic Coordinate Reference System

  • treats the globe as if it was a sphere divided into 360 equal parts called degrees
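
Because degrees are angles rather than distances, the ground length of one degree of longitude changes with latitude. The sketch below illustrates this under a simplifying assumption: a perfectly spherical Earth with a mean radius of 6,371 km (real coordinate reference systems use a spheroid). The function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a perfectly spherical Earth


def km_per_degree_longitude(latitude_deg: float) -> float:
    """Ground length (km) of one degree of longitude at a given latitude."""
    # A circle of latitude has radius R * cos(latitude); one degree is 1/360 of it.
    circumference_km = 2 * math.pi * EARTH_RADIUS_KM * math.cos(math.radians(latitude_deg))
    return circumference_km / 360


for lat in (0, 30, 60):
    print(f"latitude {lat:>2}: {km_per_degree_longitude(lat):.1f} km per degree of longitude")
```

At 60 degrees of latitude a degree of longitude covers half the ground distance it covers at the equator, which is one reason distances are usually measured in a projected system rather than in degrees.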


Projected Coordinate Reference System

  • a flat, two-dimensional plane (made by projecting a spheroid onto a 2D surface), which gives it constant lengths, angles and areas across the two dimensions
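
To make the projected idea concrete, the sketch below converts longitude and latitude in degrees into metres on a flat plane using the spherical Web Mercator formula (the projection behind EPSG:3857). It is one projection picked for illustration, not the only choice. The function name is hypothetical and the example coordinates are approximate.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # sphere radius used by spherical Web Mercator


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project geographic degrees to planar metres (spherical Web Mercator)."""
    x = SPHERE_RADIUS_M * math.radians(lon_deg)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Fremantle, Western Australia (approximate coordinates, for illustration only)
x, y = lonlat_to_web_mercator(115.75, -32.05)
print(f"x = {x:.0f} m, y = {y:.0f} m")
```

The input is a pair of angles; the output is a pair of distances from the origin, east and north, in metres.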

Coordinate reference systems

Simply

Spatial data is just like normal data except it has an extra “geometry column”
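
A minimal sketch of that idea using plain Python dictionaries: each row holds ordinary attributes plus one extra "geometry" entry. The column names, the tuple layout of the geometry and the helper function are assumptions made for illustration, and the coordinates are approximate.

```python
# Ordinary attribute columns ("name", "country") plus one extra geometry column.
rows = [
    {"name": "London", "country": "UK", "geometry": ("Point", (-0.13, 51.51))},
    {"name": "New York City", "country": "USA", "geometry": ("Point", (-74.01, 40.71))},
    {"name": "Fremantle", "country": "Australia", "geometry": ("Point", (115.75, -32.05))},
]


def in_bbox(geometry, min_lon, min_lat, max_lon, max_lat):
    """Return True if a point geometry falls inside a bounding box."""
    _kind, (lon, lat) = geometry
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# A normal (attribute) query only looks at the ordinary columns.
uk_places = [row["name"] for row in rows if row["country"] == "UK"]

# A spatial query looks at the geometry column instead.
southern_hemisphere = [row["name"] for row in rows if in_bbox(row["geometry"], -180, -90, 180, 0)]

print(uk_places)
print(southern_hemisphere)
```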

Pedagogic challenges

Part 3: Data contamination / manipulation?

Data contamination / manipulation


OK, what about geographic data?

Who has made our boundary data?

Who has manipulated our boundary data?

Who has made our boundary data?

Redlining

  • 1930s – the Home Owners’ Loan Corporation (USA) set out to prevent missed payments and drew “residential security maps” based on race
    • People abandon areas
    • Can’t refinance
    • Less property tax for services
    • Social equity issues remain
    • 1968 Fair Housing Act

Los Angeles Redlining

Who has made our boundary data?

Gerrymandering

Every 10 years electoral districts are redrawn (“redistricting”). Thomas Hofeller (Republican) = PACK and CRACK

  • PACK = put all the Democratic voters in 1 district
  • CRACK = sprinkle them out so they never have a majority
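
A small worked example of the arithmetic, under invented numbers: 50 voters in 5 districts of 10, where one party has 30 supporters (60%). The district sizes, voter counts and function name are all hypothetical; the point is only that the same voters can produce very different seat counts depending on where the lines are drawn.

```python
def seats_won(party_voters_per_district: list[int], district_size: int) -> int:
    """Count districts where the party holds a strict majority of voters."""
    return sum(1 for voters in party_voters_per_district if voters > district_size / 2)


DISTRICT_SIZE = 10  # hypothetical: 5 districts of 10 voters each

# The same 30 supporters (60% of all voters) under two hypothetical maps.
even_map = [6, 6, 6, 6, 6]
packed_and_cracked_map = [10, 10, 4, 3, 3]  # two packed districts, three cracked ones

for label, plan in (("even", even_map), ("packed and cracked", packed_and_cracked_map)):
    print(f"{label}: {seats_won(plan, DISTRICT_SIZE)} of {len(plan)} seats")
```

With the even map the party wins all 5 seats; with the packed and cracked map the same 30 supporters win only 2 of 5.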

Gerrymandering

“Redistricting is democracy at work” - Tom Hofeller

Pedagogic challenges

Part 4: Big Data

Big data

Big geospatial data include datasets that are too large to be processed using traditional GIS tools

Source: GIS Harvard

Why are they large?

Raster

Vector

What can we do about it?

Parquet files

  • We are moving from row-based storage to column-based storage

  • About 50x faster than a .csv

  • It groups our data.

    • For example, with a row group size of 2, all the data from rows 1 and 2 is stored next to each other, then row 3 starts the next group. These blocks are the GROUPS (or PARTITIONS)

    • If we have large data this means we can skip groups we don’t need
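
The group-skipping idea can be sketched in plain Python. This is a toy model rather than the real Parquet format or any library's API: the class, the function names and the fare values are invented for illustration. Each group stores its values column by column and keeps a minimum and maximum per column, so a query can rule out a whole group without reading its values.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """A block of rows stored column by column, with min/max statistics."""
    columns: dict
    stats: dict


def build_row_groups(rows, group_size):
    """Split rows into groups and store each group in a column-based layout."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Column-based layout: all values of one column sit next to each other.
        columns = {key: [row[key] for row in chunk] for key in chunk[0]}
        stats = {key: (min(values), max(values)) for key, values in columns.items()}
        groups.append(RowGroup(columns, stats))
    return groups


def read_where_greater(groups, column, threshold):
    """Return matching values and the number of groups skipped unread."""
    matches, skipped = [], 0
    for group in groups:
        if group.stats[column][1] <= threshold:  # group maximum too small: skip it
            skipped += 1
            continue
        matches.extend(value for value in group.columns[column] if value > threshold)
    return matches, skipped


# Hypothetical taxi-style records; the fares are invented for illustration.
rows = [{"trip_id": i, "fare": fare} for i, fare in enumerate([5, 7, 6, 8, 40, 55], start=1)]
groups = build_row_groups(rows, group_size=2)
print(read_where_greater(groups, "fare", 30))  # ([40, 55], 2)
```

Two of the three groups are skipped using their statistics alone; only the last group is actually scanned.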

Demystifying the Parquet File Format

New York City Taxi and Limousine Commission (TLC) all records from Yellow and Green Cabs

Concepts

 

We can go faster!


DuckDB

to_duckdb() 
to_arrow()

Regarding performance, parquet is 717 times faster than the same query on a csv file, and duckdb is 2808 times faster.

Source: Christophe Nicault

Notes

Postgres

Postgres = object-relational database

DVD Rental Model

PostgreSQL has a PostGIS extension

This enables the “geometry” column and spatial queries

Making random points in polygons

5 million random points

  • QGIS = 226 seconds
  • PostGIS = 18 seconds

Source: Why should you care about PostGIS? — A gentle introduction to spatial databases
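
For readers unfamiliar with the task being timed, here is one simple way to generate random points inside a polygon: draw points in the bounding box and keep those that pass a ray-casting point-in-polygon test. This is a swapped-in textbook approach for illustration, not how QGIS or PostGIS implement it, and the function names and the L-shaped polygon are hypothetical.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray casting: count edge crossings of a ray running right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, count, seed=None):
    """Rejection sampling: draw in the bounding box, keep points inside the polygon."""
    rng = random.Random(seed)
    xs = [vertex[0] for vertex in polygon]
    ys = [vertex[1] for vertex in polygon]
    points = []
    while len(points) < count:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            points.append((x, y))
    return points


# Hypothetical L-shaped polygon in arbitrary planar units.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
points = random_points_in_polygon(l_shape, count=1000, seed=42)
print(len(points))
```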

PostGIS


Starting

Pedagogic challenges

Part 5: Reproducibility

What led me here?

  • Lecture with Carl Howe

In 2017: “90% of the data in the world today has been created in the last two years alone, at 2.5 quintillion bytes of data a day!” - IBM

OK, what about geographic data?

A shifting landscape

Paper: Opening practice: supporting reproducibility and critical spatial data science
  • A comparison of Geographically Weighted Regression (GWR) across:
    • 4 open software packages
    • 2 black box / commercial implementations

All of the implementations were tested with the same input data.


They all gave the same results except the ESRI/ArcGIS implementation (Li 2018)


and although ESRI provide help for the GWR tools, the actual coding is closed—the underlying code is not revealed

Part 6: Teaching criticality, data bias, reproducibility


  1. Lead by example
    • 1b. Listen to alumni / employers
    • 1c. Learn by doing
  2. Don’t assess it, make it mandatory for the assessment*

1. Lead by example

1b. Listen to Alumni / employers

1c. Design and outputs

Learning happens by doing


Weekly homework that we dedicate time to discussing

1c. Design and output

Part 1: GIS tools…subject based learning

You need to calculate the average percentage of science students (in all grades) per county meeting the required standards
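
The underlying operation is a group-by and average. A minimal sketch, assuming one record per school and grade with a county name and a percentage; the field names and all values are invented for illustration and are not the practical's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: invented counties, grades and percentages.
records = [
    {"county": "A", "grade": 5, "pct_meeting_standard": 60.0},
    {"county": "A", "grade": 8, "pct_meeting_standard": 70.0},
    {"county": "B", "grade": 5, "pct_meeting_standard": 40.0},
    {"county": "B", "grade": 8, "pct_meeting_standard": 50.0},
    {"county": "B", "grade": 10, "pct_meeting_standard": 60.0},
]

# Group the percentages by county, across all grades.
by_county = defaultdict(list)
for record in records:
    by_county[record["county"]].append(record["pct_meeting_standard"])

# Unweighted mean per county.
average_pct = {county: mean(values) for county, values in by_county.items()}
print(average_pct)  # {'A': 65.0, 'B': 50.0}
```

This is an unweighted mean of percentages; if student counts were available, weighting by them would arguably give a fairer county figure.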

Part 2: GIS analysis… problem based learning

Each practical answers a question….

What are the factors that might lead to variation in Average GCSE point scores across the city?

What are we assessing?


Can students apply the tools / methods with different scenarios and data?


Can students critique the process?

2. Make it mandatory for the assessment

Part 2: GIS analysis, example practice question

New York City wishes to conduct a study that aims to prevent people being evicted through understanding possible related factors. You have been enlisted as a consultant and tasked with conducting an analysis of their data from 2020.

Data:

2. Make it mandatory for the assessment

DISCUSS

  • How were the evictions recorded?

  • Why were there limited evictions during 2020, then a sudden peak? (COVID ban on evictions)

  • How can identifying spatially related factors to evictions be useful?

  • Are there certain areas that have higher evictions than others? Why might this be?

  • What assumptions does the data make?

  • What assumptions does the model make?

2. Make it mandatory for the assessment

Students

Conclusion

Scientists must have a say in the future of cities, McPhearson 2016